@alamb (Contributor) commented Mar 31, 2021

Rationale

Accessing the list of columns via `select * from information_schema.columns` (introduced in #9840) is a lot to type.

See the doc for background: https://docs.google.com/document/d/12cpZUSNPqVH9Z0BBx6O8REu7TFqL-NPPAYCUPpDls1k/edit#

This is a sister PR to the `SHOW TABLES` PR: #9847

Proposal

Add support for the `SHOW COLUMNS FROM <table>` command.

This follows the MySQL syntax supported by sqlparser: https://dev.mysql.com/doc/refman/8.0/en/show-columns.html
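
As an illustration only, the basic form of the statement can be recognized with a few lines of Rust. This is a minimal sketch under stated assumptions, not the PR's code: the PR builds on the MySQL syntax that sqlparser already parses, and `parse_show_columns` is a hypothetical name. It handles only `SHOW COLUMNS FROM <table>` (with an optional trailing `;`), splits the table name on `.`, and ignores the optional MySQL clauses such as `FULL`, `FROM db_name`, `LIKE` and `WHERE`, as well as quoted identifiers.

```rust
/// Hypothetical helper (not DataFusion's API): if `sql` is the basic form
/// `SHOW COLUMNS FROM <table>`, return the table identifier split on '.'.
fn parse_show_columns(sql: &str) -> Option<Vec<String>> {
    // Drop surrounding whitespace and an optional trailing semicolon.
    let sql = sql.trim().trim_end_matches(';');
    let mut words = sql.split_whitespace();

    // Keywords match case-insensitively; `?` returns None if the input is too short.
    for keyword in ["SHOW", "COLUMNS", "FROM"].iter() {
        if !words.next()?.eq_ignore_ascii_case(keyword) {
            return None;
        }
    }
    let table = words.next()?;

    // Extra tokens (e.g. MySQL's `FROM db_name`, `LIKE` or `WHERE` clauses) are out of scope.
    if words.next().is_some() {
        return None;
    }
    Some(table.split('.').map(String::from).collect())
}

fn main() {
    let inputs = [
        "show columns from t;",
        "SHOW COLUMNS FROM datafusion.public.t",
        "select * from t",
    ];
    for sql in inputs.iter() {
        println!("{:?} -> {:?}", sql, parse_show_columns(sql));
    }
}
```

The identifier parts come back in source order, so a qualified name such as `datafusion.public.t` yields the catalog, schema and table parts in that order.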

Example Use

Setup:

echo "1,Foo,44.9" > /tmp/table.csv
echo "2,Bar,22.1" >> /tmp/table.csv
cargo run --bin datafusion-cli

Then run:

> CREATE EXTERNAL TABLE t(a int, b varchar, c float)
STORED AS CSV
LOCATION '/tmp/table.csv';

 0 rows in set. Query took 0 seconds.

> show columns from t;
+---------------+--------------+------------+-------------+-----------+-------------+
| table_catalog | table_schema | table_name | column_name | data_type | is_nullable |
+---------------+--------------+------------+-------------+-----------+-------------+
| datafusion    | public       | t          | a           | Int32     | NO          |
| datafusion    | public       | t          | b           | Utf8      | NO          |
| datafusion    | public       | t          | c           | Float32   | NO          |
+---------------+--------------+------------+-------------+-----------+-------------+
3 row in set. Query took 0 seconds.

Commentary

Note that identifiers are case sensitive. This is a more general problem that affects all name resolution, not just `SHOW COLUMNS`. Ideally, both of these should also work:

> show columns from T;
Plan("Unknown relation for SHOW COLUMNS: T")

> select * from T;
Plan("Table or CTE with name \'T\' not found")
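
One possible direction, sketched in Rust under the assumption that the catalog can be modeled as a map from table name to column names: normalize identifiers before lookup, folding unquoted names to lowercase (the convention PostgreSQL follows) and matching double-quoted names exactly. The names `normalize` and `lookup` are hypothetical; this is not what DataFusion does in this PR.

```rust
use std::collections::HashMap;

/// Hypothetical identifier normalization (not DataFusion's behavior in this PR):
/// a double-quoted identifier keeps its exact case, while an unquoted one is
/// folded to lowercase, the convention PostgreSQL follows.
fn normalize(ident: &str) -> String {
    match ident.strip_prefix('"').and_then(|rest| rest.strip_suffix('"')) {
        Some(quoted) => quoted.to_string(),
        None => ident.to_lowercase(),
    }
}

/// Look a table up by its normalized name. The map stands in for a catalog.
fn lookup<'a, V>(tables: &'a HashMap<String, V>, ident: &str) -> Option<&'a V> {
    tables.get(&normalize(ident))
}

fn main() {
    let mut tables = HashMap::new();
    // Stands in for the table registered by `CREATE EXTERNAL TABLE t(...)`.
    tables.insert("t".to_string(), vec!["a", "b", "c"]);

    // Exact, case-sensitive matching: `T` is not found.
    println!("exact lookup of T: {:?}", tables.get("T"));
    // With folding, unquoted `T` resolves to `t`; quoted "T" still does not.
    println!("normalized T: {:?}", lookup(&tables, "T"));
    println!("normalized \"T\": {:?}", lookup(&tables, "\"T\""));
}
```

Folding to lowercase is only one option. A case-insensitive comparison that ignores quoting is another, with different trade-offs for tables whose names differ only in case.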

@github-actions bot commented Apr 1, 2021

@Dandandan (Contributor), Apr 1, 2021

Why use `rev` rather than listing the items in columns in reverse?

Contributor Author


I did it this way because the identifier may not include a table_catalog or table_schema.

So you have to handle each of these cases:

table_name (position 0)
table_name (position 1), schema_name (position 0)
table_name (position 2), schema_name (position 1), catalog_name (position 0)

Iterating with `rev` was the formulation I could come up with to match the identifier parts up with the information_schema column names.

I am open to other ways of doing it if you have suggestions.
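
A minimal Rust sketch of the matching described above, assuming the identifier has already been split into its dot-separated parts. `column_filters` and `where_clause` are hypothetical names, not DataFusion functions, and the generated predicate does not escape quotes, so it is for illustration only.

```rust
/// information_schema.columns columns, ordered from the last part of a
/// qualified identifier (the table) to the first (the catalog).
const NAME_COLUMNS: [&str; 3] = ["table_name", "table_schema", "table_catalog"];

/// Hypothetical helper: pair each identifier part with the column it should
/// match. Iterating the parts in reverse means the table name always lines up
/// with `table_name`, however many qualifiers precede it; `zip` stops at the
/// shorter sequence, so parts beyond the third from the end are ignored.
fn column_filters(ident: &[&str]) -> Vec<(&'static str, String)> {
    ident
        .iter()
        .rev()
        .zip(NAME_COLUMNS.iter())
        .map(|(part, column)| (*column, part.to_string()))
        .collect()
}

/// Render the pairs as a SQL predicate. Values are not escaped: sketch only.
fn where_clause(ident: &[&str]) -> String {
    column_filters(ident)
        .iter()
        .map(|(column, value)| format!("{} = '{}'", column, value))
        .collect::<Vec<_>>()
        .join(" AND ")
}

fn main() {
    let idents: [&[&str]; 3] = [&["t"], &["public", "t"], &["datafusion", "public", "t"]];
    for ident in idents.iter() {
        println!("{}", where_clause(ident));
    }
}
```

Because the parts are paired from the end, `t`, `public.t` and `datafusion.public.t` all filter on `table_name = 't'`, and only the qualified forms add schema and catalog conditions.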

Contributor


You are totally right 👍. I don't have a suggestion for an easier way.

@Dandandan (Contributor)

Really like these features, @alamb. Really cool additions that make DataFusion more mature for BI tools, catalogs, tools like data build tool, and so on.

@alamb force-pushed the alamb/show_columns branch from 618ef63 to e3f1410 April 4, 2021 09:47
@alamb (Contributor, Author) commented Apr 5, 2021

FYI @returnString and @seddonm1

@jorgecarleitao (Member) left a comment

Sorry for the delay; I had skimmed through it but did not leave any feedback :/. Thanks a lot for this, @alamb.

I can only comment on the code, which looks great; I will leave design and functionality to @andygrove, as that is further from my comfort zone for now.

My only general comment is that we should document this somewhere, e.g. in the README, together with the other associated functionality (schema, catalog, etc.). That can be done in a separate, dedicated PR.

@returnString (Contributor) left a comment

Great stuff! 👍

I agree with @jorgecarleitao that we should have some documentation for these metadata operations and for the catalog system more generally. I'd be happy to organise or contribute to that.

@alamb (Contributor, Author) commented Apr 5, 2021

Thanks. I made a PR with some proposed docs here: #9895

@alamb (Contributor, Author) commented Apr 5, 2021

I plan to wait until tomorrow to merge this, in case @andygrove has any comments.

@andygrove (Member) left a comment

LGTM @alamb

alamb added a commit that referenced this pull request Apr 6, 2021
… Information Schema

# Rationale
As suggested by @jorgecarleitao and @returnString on #9866 (review), this PR adds documentation about the information schema, `SHOW TABLES`, and `SHOW COLUMNS`.

Note that this does not document the catalog system more generally. Perhaps @returnString can comment on that.

Closes #9895 from alamb/alamb/schema_docs

Authored-by: Andrew Lamb <[email protected]>
Signed-off-by: Andrew Lamb <[email protected]>
@alamb closed this in 3e825a7 Apr 6, 2021
pachadotdev pushed a commit to pachadotdev/arrow that referenced this pull request Apr 6, 2021
pachadotdev pushed a commit to pachadotdev/arrow that referenced this pull request Apr 6, 2021